Dynamic Testing of Writing

FREMO 2025

Chris Wheadon

No More Marking

Comparative Judgement & The Evolution of Assessing Writing

  • We’ve been running Comparative Judgement assessments of writing for nearly a decade now.
  • Assessed over 3 million pieces of writing
  • Distributed model where teachers across the world judge & moderate their pupils’ writing
  • Statistical model that places all the work on a single scale across time & space
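The "single scale" is typically produced by fitting a Bradley–Terry-style model to the pairwise judgements, so that every script gets a strength estimate even though judges only ever see two scripts at a time. A minimal sketch of that idea (not No More Marking's actual implementation) using the classic minorise–maximise update:

```python
import math

def fit_bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry strengths from pairwise judgements.

    comparisons: list of (winner, loser) index pairs.
    Returns log-abilities on a single scale, centred at zero.
    Uses the classic MM (minorise-maximise) update.
    """
    wins = [0] * n_items
    # pair_counts[i][j] = number of times i and j were compared
    pair_counts = [[0] * n_items for _ in range(n_items)]
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[w][l] += 1
        pair_counts[l][w] += 1

    p = [1.0] * n_items  # strength parameters
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            denom = sum(pair_counts[i][j] / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            new_p.append(wins[i] / denom if denom > 0 else p[i])
        # normalise so the geometric mean strength is 1
        g = math.exp(sum(math.log(x) for x in new_p) / n_items)
        p = [x / g for x in new_p]
    return [math.log(x) for x in p]
```

Because the model only needs win/loss records, judgements collected by different teachers in different places can be pooled onto one scale, which is what makes the distributed moderation model workable.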

The Early Days: 2016

  • Schools submitted a portfolio of their students’ writing.
  • Students could rework and redraft the submitted writing.

The Problem with the Early Approach

  • Sometimes schools submitted 60 portfolios from 60 different students that were all incredibly similar.
  • Students had so much scaffolded feedback and heavily structured guidance that they ended up creating very similar pieces.
  • This was a problem in both educational & assessment terms.

Why Open-Ended Tasks Matter

  • The reason for open-ended tasks (like extended writing) is to encourage creative and unexpected responses.
  • For clear, specific correct answers, closed questions are better than extended writing.
  • Comparative Judgement assesses holistic quality, not checklist completion.

Shift in Approach: 2018 Onwards

  • Changed assessment: students complete writing in independent conditions.
  • No opportunity for feedback and redrafting.
  • Assessed nearly two million pieces of student writing this way.
  • Resulted in fantastically original and entertaining pieces.

But What About Feedback?

  • This approach works well, but questions about redrafting and editing persist.
  • Teachers emphasize that redrafting, editing, and responding to feedback are:
    • Genuine authentic skills that matter.
    • Skills that should be taught.
    • Skills that should be assessed too.

Previous Stance on Assessing Redrafting

  • Acknowledged the value of these skills.
  • BUT, technical & workload challenges in assessing them formally were considered too great.
  • Suggestion: teach and include in curriculum & lesson plans, but avoid formal assessment.

A New Solution: Dynamic Assessment

  • Now able to use a mix of new AI technology and established psychometric techniques.
  • Developing a new kind of dynamic assessment to address this issue.

What is Dynamic Testing?

  • Allows pupils to receive some form of guidance during an initial testing period.
  • Attempts to measure learning gains that directly result from the guidance.

Dynamic Testing: “Cake” Format

  • Familiar form: nudges and hints in online gamified assessments (e.g., Duolingo, Khan Academy).
  • Process: Given a question, if struggling, receive a staged series of hints and suggestions.
  • Example: “What is 58 + 76?”
    • Hint 1: “Try adding the ones place first: What is 8 + 6?”
    • Hint 2: “8 + 6 = 14. Write down the 4 and carry the 1 to the tens column.”
  • Called “cake” format due to layers of hints.
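The staged-hint logic above can be sketched as a small state machine: each wrong answer peels off the next layer of the cake. This is an illustrative sketch only (the class and method names are hypothetical), using the slide's 58 + 76 example:

```python
class CakeItem:
    """One 'cake' format item: a question plus layered hints.

    Hypothetical sketch: each incorrect attempt releases the next
    hint layer, until the hints run out or the answer is correct.
    """
    def __init__(self, question, answer, hints):
        self.question = question
        self.answer = answer
        self.hints = hints
        self.hints_used = 0

    def respond(self, pupil_answer):
        """Return 'correct', the next hint, or 'no more hints'."""
        if pupil_answer == self.answer:
            return "correct"
        if self.hints_used < len(self.hints):
            hint = self.hints[self.hints_used]
            self.hints_used += 1
            return hint
        return "no more hints"

item = CakeItem(
    "What is 58 + 76?", 134,
    ["Try adding the ones place first: What is 8 + 6?",
     "8 + 6 = 14. Write down the 4 and carry the 1 to the tens column."])
```

Tracking `hints_used` is what makes this a *dynamic* test: the number of layers a pupil needed is itself measurement information, not just scaffolding.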

Dynamic Testing: “Sandwich” Format

  • An alternative form of dynamic testing.
  • Involves:
    1. Pre-test
    2. Tailored instruction
    3. Post-test

Our Plan: Y6 Redrafting Assessment (Sandwich Format)

  • An assessment of Y6 redrafting skills.
  • Here’s how it works.

Sandwich Format: Step 1 - Pre-test

  • Students complete their writing in standardised conditions without assistance.
  • Part of our national projects.
  • Assessed using human teachers making Comparative Judgements.
  • This assessment has already taken place (annual Comparative Judgement Year 6 assessment in Feb/March).

Sandwich Format: Step 2 - Tailored Instruction

Students receive an automated and personalised workbook with four kinds of feedback:

  1. A standardised score (human teachers, CJ process).

  2. A paragraph of written feedback (spoken by human teachers, processed by AI).

  3. A checklist of writing features (directly generated by AI).

  4. A series of multiple choice questions on an aspect of writing, introduced by a short instructional resource. The exact MCQs depend on their standardised score.
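Routing pupils to MCQs by standardised score amounts to a simple band lookup. A hypothetical sketch of that routing (the thresholds and topic lists below are illustrative inventions, not the real cut scores; they are loosely motivated by the later slides, where a pupil scoring 503 was allocated questions on capital letters):

```python
def quiz_topics(scaled_score):
    """Map a standardised score to MCQ topics.

    Thresholds and topics are illustrative only -- the real
    assessment's cut scores and topic allocation are not public here.
    """
    if scaled_score < 520:
        return ["capital letters", "full stops"]
    elif scaled_score < 580:
        return ["run-on sentences", "vocabulary"]
    else:
        return ["characterisation", "varied sentence structures"]
```

The design point is that the MCQ section is the only part of the workbook driven by the score band; the written feedback and checklist are generated per script.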

Sandwich Format: Step 3 - Post-test Redraft

  • The students rewrite their narrative.
  • It is assessed again using Comparative Judgement.
  • Schools can choose to use AI judgements to supplement or replace human teacher judgement (reduces teacher workload).

Redrafting Conditions

  • Want to guard against the past problem of very similar pieces of work.
  • Conceptualising redrafting not as having one correct end goal.
  • View it as similar to the original writing process: authentic, creative, open to different responses.

Hopes for the New Approach

  • Increase students’ motivation to re-draft.
  • Provide them with useful guidance to improve their work.

Improving Work vs. Thinking

  • Risk: students focus on surface improvements to that specific piece, which don’t update their mental models or transfer to future writing.
  • Multiple-choice questions designed to provoke changes in student thinking.
  • Need to track students’ writing again (e.g., pre-test of following year’s assessment) to see if improvements are consolidated.

Two students

  • Do you think their writing improved?
  • Do you think the feedback could have led to the improvement?

[Images: pre- and post-redraft writing samples]

Student A: Addressing Run-On Sentences

  • Example Quiz Question: [image]
  • Action in Redraft: Successfully corrected the run-on sentence in the first paragraph.

Student A: Responding to AI Characterisation Feedback

  • AI-Generated Feedback: Received suggestions regarding characterisation. [image]
  • Creative Response:
    • Altered the story significantly.
    • The mysterious figure was revealed to be the main character’s mother! PLOT TWIST!!

Student A: Responding to AI Vocabulary Feedback

  • AI-Generated Feedback: Try using different words to avoid repeating “sinking”
  • Response:
    • “I was submerging too.”

Student A: Responding to AI Describing Feedback

  • AI-Generated Feedback: Is it hot or cold?
  • Original Response:
    • “The atmosphere was filled with a dark chill.”
  • Redrafted Response:
    • “A cold breeze swept over”

Student A: Impressive Improvement

  • Redrafted Score Improvement: +69 points
  • This is significantly more than the average improvement of 12 points.
  • Conclusion: Student A clearly responded to feedback, and their writing improved as a result.

[Images: pre- and post-redraft writing samples]

Student B: Mixed Results

  • Initial Score: 503
  • Feedback Focus: Identified issues with capital letters.
  • Intervention: Allocated a set of quiz questions on capital letters.

Student B: Addressing Capitalisation

  • Example Quiz Question: [image]
  • Action in Redraft:
    • Amended one capitalisation error: “eiffletower” to “Eiffle tower”.
    • However, other capitalisation errors remained uncorrected.

Student B: Responding to AI Characterisation Feedback

  • AI-Generated Feedback: Received suggestions regarding characterisation. [image]
  • Response:
    • Named the children in the story.
    • However, failed to capitalise their names. (e.g., “helena and bea” instead of “Helena and Bea”)

Student B: Outcome

  • Redrafted Score: Decreased by 21 points compared to the original piece.
  • Conclusion: Made some attempts to respond to feedback but didn’t fully implement changes or address all identified issues, and the overall score did not improve.

Overall Results

What percentage of pupils do you expect to improve their scores?

Overall Results

  • 2,881 pupils
  • Correlation between the scores on the two assessments was 0.67
  • Scores rose by 11 scaled score points from 546 to 557
  • An increase of 11 points on this assessment is an effect size of 0.34
  • 62% saw their scores increase.
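Assuming "effect size" here means the mean gain divided by a standard deviation (Cohen's d style, which is the usual convention), the reported figures pin down the implied spread of the scale:

```python
def effect_size(mean_gain, sd):
    """Standardised effect size: mean gain divided by a spread measure."""
    return mean_gain / sd

# The reported 11-point rise (546 -> 557) at an effect size of 0.34
# implies a standard deviation of roughly 11 / 0.34, about 32 points.
implied_sd = 11 / 0.34
```

That implied spread of roughly 32 scaled-score points is the denominator assumption behind calling an 11-point gain a 0.34 effect.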

Results by Level

Level   Mean pre   Mean post   Diff   SD pre   SD post   n
WTS     497        515         19     23       37        696
EXS     553        563         10     15       37        1773
GDS     599        606          7     15       38        412

Not Everyone Improved!

How do teachers help with re-drafting?

  • Reports mean nothing if students don’t engage with them or understand them
  • There is so much detail available that the reports can be used in very different ways

Cynthy Tang of Rose Hill Primary 1

I began by giving pupils the AI student reports and asking them to look at the three areas (capital letters, vocab and run on sentences) they were asked to focus on and to attempt the multiple-choice questions. I went through all three areas together with the whole class and marked them together as I felt that was still beneficial.

Cynthy Tang of Rose Hill Primary 2

I took one pupil’s student report and read out the three improvement/feedback points on the table given by AI Chloe. I asked the class to use talk-partners to discuss what strategies they could use to improve their writing and came up with a class list: ambitious vocab, varied sentence structures, use of our five senses, show-not-tell, punctuation etc.

Cynthy Tang of Rose Hill Primary 3

Alongside this, I had put pupil’s writing into ChatGPT and asked it to provide five new targets for each pupil to focus on. Some of this was spelling, punctuation, comma splices and run on sentences. With ChatGPT, it was able to make further adaptations for lower ability chn by offering specific examples of what they wrote and provide an example of an improved sentence.

Cynthy Tang of Rose Hill Primary 4

I then gave out copies of their original APWs and asked pupils to edit their work with green pen before writing up on APW coded sheets.

What can go wrong with AI!

“What you can do better - always capitalise ocean and crocodile.” [Really?]

“What you can do better - always start a sentence with a capital letter.” [The pupil had]

“What you can do better - make sure you finish each sentence with a full stop.” [The pupil had]

The AI struggled to provide feedback on work in which the pupils had done everything correctly.

What can go really wrong with AI!!

“Thank you for watching and don’t forget to like and subscribe!”

“Okay, guys. Bye! Peace.”

What can go really really wrong with AI!!!

___

Conclusion: The Nuances of Feedback

  • Responding to feedback effectively is not always easy.
  • Extended writing has many “moving parts.”
  • Positive: Most students were able to use the feedback to make targeted improvements.
  • Challenge: Some students struggled.
  • Goal: Ensure all students receive feedback that is comprehensible and actionable for them.

Conclusion: Beyond a Single Piece

  • Key Consideration: Do students who improve sustain that improvement in subsequent assessments?
  • Ultimate Aim: The point of feedback is not just to improve the writing, but to improve the student.